DAY[23] - Kaggle in Practice: Feature Transformation

11th Ironman Contest python3 machine learning

Austin

Team Bikini Bottom

2019-10-08 04:43:11



Feature Adjustment

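Here we use a somewhat unusual operation called the Box-Cox transform (boxcox). boxcox1p adds 1 to the input before applying Box-Cox, which avoids errors when a feature contains zeros, since the plain transform requires strictly positive values. For a value x and a parameter λ, the Box-Cox transform is:

$$
y =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\
\ln x, & \lambda = 0
\end{cases}
$$

boxcox1p applies the same formula with x replaced by 1 + x.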
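To make the definition concrete, here is a minimal pure-Python sketch of what boxcox1p computes for a single value. The helper name `boxcox1p_manual` is made up for illustration and is not part of SciPy; the sketch assumes x > -1 and does no input validation.

```python
import math

def boxcox1p_manual(x, lmbda):
    """Box-Cox transform applied to (1 + x). Hypothetical helper, for illustration only."""
    if lmbda == 0:
        # In the limit lambda -> 0 the transform becomes log(1 + x).
        return math.log1p(x)
    return ((1.0 + x) ** lmbda - 1.0) / lmbda

print(boxcox1p_manual(0.0, 0.5))  # 0.0: a zero input stays zero instead of raising an error
print(boxcox1p_manual(3.0, 0.5))  # (4 ** 0.5 - 1) / 0.5 = 2.0
print(boxcox1p_manual(3.0, 0.0))  # log(4)
```

Because the shift by 1 happens inside the transform, zero-valued features (common in count or area columns) can be passed in directly.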

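After this transformation, the skewness of a variable is closer to 0. As discussed in earlier chapters, when features are closer to a normal distribution (which has a skewness of 0), models tend to predict better.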
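The effect on skewness can be checked by hand on a small sample. The sketch below uses made-up numbers and a hypothetical helper `sample_skewness`, which implements the biased estimator m3 / m2^1.5 (the default definition used by `scipy.stats.skew`). It fixes λ = 0, where boxcox1p reduces to log(1 + x), purely as an illustrative choice.

```python
import math

def sample_skewness(values):
    """Biased sample skewness m3 / m2**1.5. Hypothetical helper, for illustration only."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up right-skewed sample: most values are small, a few are very large.
raw = [1, 2, 3, 4, 5, 10, 20, 50, 100, 500]

# With lambda = 0, boxcox1p(x, 0) equals log(1 + x).
transformed = [math.log1p(v) for v in raw]

skew_raw = sample_skewness(raw)
skew_transformed = sample_skewness(transformed)
print(f"skewness before: {skew_raw:.2f}, after: {skew_transformed:.2f}")
```

The long right tail is compressed by the transform, so the skewness moves toward 0 even though λ was not tuned for this sample.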

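# boxcox1p lives in scipy.special; skew and boxcox_normmax are in scipy.stats
from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax, skew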
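# Collect the names of all numeric columns
numeric_dtypes = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
numerics2 = []
for i in features.columns:
    if features[i].dtype in numeric_dtypes:
        numerics2.append(i)
# Skewness of each numeric feature, largest first
skew_features = features[numerics2].apply(lambda x: skew(x)).sort_values(ascending=False)

# Keep only the features whose skewness is above 0.5
high_skew = skew_features[skew_features > 0.5]
skew_index = high_skew.index

# Transform each highly skewed feature, using the lambda estimated by boxcox_normmax
for i in skew_index:
    features[i] = boxcox1p(features[i], boxcox_normmax(features[i] + 1))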
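
To try the same steps without the competition data, the following is a self-contained sketch on a toy DataFrame. The `toy` DataFrame, its column names, and the random distributions are all assumptions made for illustration; they stand in for the article's `features` and are not taken from the Kaggle dataset.

```python
import numpy as np
import pandas as pd
from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax, skew

# Hypothetical toy data standing in for `features`.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "area": rng.lognormal(mean=3.0, sigma=1.0, size=500),  # strongly right-skewed
    "rooms": rng.normal(loc=5.0, scale=1.0, size=500),     # roughly symmetric
})

# Skewness of every column before the transform.
skew_before = toy.apply(skew)
high_skew = skew_before[skew_before > 0.5]

for col in high_skew.index:
    # Estimate lambda on the shifted data (must be strictly positive),
    # then apply the Box-Cox transform of 1 + x with that lambda.
    lam = boxcox_normmax(toy[col].to_numpy() + 1)
    toy[col] = boxcox1p(toy[col], lam)

skew_after = toy.apply(skew)
print(pd.DataFrame({"before": skew_before, "after": skew_after}))
```

Only the column that exceeds the 0.5 threshold is transformed; the roughly symmetric column is left as it is, which mirrors how the loop above treats `features`.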